Efficient Group K Nearest-Neighbor Spatial Query Processing in Apache Spark
Authors
Abstract
Aiming at the problem of spatial query processing in distributed computing systems, the design and implementation of new distributed spatial query algorithms is a current challenge. Apache Spark is a memory-based framework suitable for real-time and batch processing. Spark-based systems allow users to work on distributed in-memory data, without worrying about the data distribution mechanism and fault-tolerance. Given two datasets of points (called Query and Training), the group K nearest-neighbor (GKNN) query retrieves K points of the Training with the smallest sum of distances to every point of the Query. This spatial query has been actively studied in centralized environments, and several performance improving techniques and pruning heuristics have also been proposed, while a distributed algorithm in Apache Hadoop was recently proposed by our team. Since, in general, Apache Hadoop exhibits lower performance than Spark, in this paper we present the first distributed GKNN query algorithm in Apache Spark and compare it against the one in Apache Hadoop. This algorithm incorporates programming features and facilities that are specific to Apache Spark. Moreover, techniques that improve performance and are applicable in Apache Spark are also incorporated. The results of an extensive set of experiments with real-world spatial datasets are presented, demonstrating that our Apache Spark GKNN solution, with its improvements, is efficient and a clear winner in comparison to processing this query in Apache Hadoop.
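To make the GKNN definition concrete, here is a minimal brute-force sketch in plain Python. It is a centralized illustration only, not the Spark algorithm presented in the paper: the function name `gknn_brute_force`, the use of Euclidean distance, and the sample points are all assumptions made for this example.

```python
import heapq
import math


def gknn_brute_force(query, training, k):
    """Return the k training points with the smallest sum of
    Euclidean distances to all query points (hypothetical helper)."""

    def sum_of_distances(point):
        # Sum of distances from one training point to every query point.
        return sum(math.dist(point, q) for q in query)

    # nsmallest keeps only the k best candidates while scanning training.
    return heapq.nsmallest(k, training, key=sum_of_distances)


# Illustrative data (assumed, not taken from the paper's datasets).
query = [(0.0, 0.0), (2.0, 0.0)]
training = [(1.0, 0.0), (1.0, 1.0), (5.0, 5.0), (-3.0, 0.0)]

result = gknn_brute_force(query, training, 2)
print(result)  # [(1.0, 0.0), (1.0, 1.0)]
```

This scan computes one distance for every pair of Training and Query points, which is the cost that pruning heuristics and distributed processing aim to reduce.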
Similar resources
Multiple k Nearest Neighbor Query Processing in Spatial Network Databases
This paper concerns the efficient processing of multiple k nearest neighbor queries in a road-network setting. The assumed setting covers a range of scenarios such as the one where a large population of mobile service users that are constrained to a road network issue nearest-neighbor queries for points of interest that are accessible via the road network. Given multiple k nearest neighbor quer...
On efficient mutual nearest neighbor query processing in spatial databases
This paper studies a new form of nearest neighbor queries in spatial databases, namely, mutual nearest neighbor (MNN) search. Given a set D of objects and a query object q, an MNN query returns from D, the set of objects that are among the k1 (≥1) nearest neighbors (NNs) of q; meanwhile, have q as one of their k2 (≥1) NNs. Although MNN queries are useful in many applications involving decisio...
SPARQL query processing with Apache Spark
The number and the size of linked open data graphs keep growing at a fast pace and confront semantic RDF services with problems characterized as Big data. Distributed query processing is one of them and needs to be efficiently addressed with execution guaranteeing scalability, high availability and fault tolerance. RDF data management systems requiring these properties are rarely built from sc...
A Review of various k-Nearest Neighbor Query Processing Techniques
Identifying the queried object in a large volume of uncertain data is a tedious task with high time and computational complexity. Various research techniques have been proposed to address this. Among these, a simple, highly efficient and effective technique is the K-Nearest Neighbor (kNN) algorithm. It is a technique which has applications in vari...
Efficient mutual nearest neighbor query processing for moving object trajectories
Given a set D of trajectories, a query object q, and a query time extent C, a mutual (i.e., symmetric) nearest neighbor (MNN) query over trajectories finds from D, the set of trajectories that are among the k1 nearest neighbors (NNs) of q within C, and meanwhile, have q as one of their k2 NNs. This type of query is useful in many applications such as decision making, data mining, and pattern rec...
Journal
Journal title: ISPRS International Journal of Geo-Information
Year: 2021
ISSN: 2220-9964
DOI: https://doi.org/10.3390/ijgi10110763